723 research outputs found

Elucidating the genetics of craniofacial shape

Author: David M. Evans
F Liu
JB Cole
JK Pickrell
JR Shaffer
K Adhikari
L Paternoster
MK Lee
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 06/03/2018

Alterations in craniofacial size and shape are apparent in many monogenic diseases and syndromes, but remarkably little is known about the genetics of face shape within healthy populations. This may be set to change following publication of a study that combines unsupervised hierarchical spectral clustering and canonical correlation analysis to help identify common genetic variants associated with craniofacial shape.

Crossref

Explore Bristol Research

University of Queensland eSpace

MGMR: leveraging RNA-Seq population data to optimize expression estimation

Author: A Oshlack
B Li
B Li
B Pasaniuc
C Trapnell
Eran Halperin
JH Bullard
JK Pickrell
KA Frazer et al.
L Pachter
MD Robinson
Ron Shamir
Roye Rozov
SB Montgomery
TP Minka
Publication venue: BioMed Central
Publication date: 01/04/2012

Background: RNA-Seq is a technique that uses next-generation sequencing to identify transcripts and estimate transcription levels. When applying this technique for quantification, one must contend with reads that align to multiple positions in the genome (multireads). Previous efforts to resolve multireads have shown that RNA-Seq expression estimation can be improved using probabilistic allocation of reads to genes. These methods use a probabilistic generative model for data generation and resolve ambiguity using likelihood-based approaches. In many instances, RNA-Seq experiments are performed in the context of a population. The generative models of current methods do not take such population information into account, and it is an open question whether this information can improve quantification of the individual samples. Results: To explore the contribution of population-level information in RNA-Seq quantification, we apply a hierarchical probabilistic generative model, which assumes that expression levels of different individuals are sampled from a Dirichlet distribution with parameters specific to the population, and that reads are sampled from the distribution of expression levels. We introduce an optimization procedure for the estimation of the model parameters, and use HapMap data and simulated data to demonstrate that the model yields a significant improvement in the accuracy of expression levels of paralogous genes. Conclusions: We provide a proof of principle of the benefit of drawing on population commonalities to estimate expression. The results of our experiments demonstrate that this approach can be beneficial, primarily for estimation at the gene level.
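The two-level generative process described in this abstract can be illustrated with a minimal sketch. It is not the MGMR implementation: the function names, the `alpha` values, and the read counts are hypothetical, and the sketch ignores multireads and the parameter-estimation step entirely. It only shows per-individual expression levels drawn from a population-level Dirichlet distribution, followed by reads drawn from those expression levels.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalising independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def simulate_individual(alpha, n_reads, rng):
    """Sample one individual's expression levels, then its per-gene read counts."""
    theta = sample_dirichlet(alpha, rng)           # individual expression levels
    genes = rng.choices(range(len(alpha)), weights=theta, k=n_reads)
    counts = [0] * len(alpha)
    for g in genes:                                # tally reads per gene
        counts[g] += 1
    return theta, counts

rng = random.Random(1)
alpha = [8.0, 2.0, 1.0]   # hypothetical population-level parameters for 3 genes
population = [simulate_individual(alpha, 1000, rng) for _ in range(5)]
for theta, counts in population:
    print([round(t, 3) for t in theta], counts)
```

Because every individual shares the same `alpha`, the sampled expression profiles cluster around a common population mean, which is the commonality the abstract proposes to exploit.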

Crossref

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

A complete tool set for molecular QTL discovery and analysis

Author: AA Shabalin
AC Nica
D Welter
H Ongen
HJ Westra
J Ernst
JD Storey
JK Pickrell
M Gutierrez-Arcelus
O Canela-Xandri
P Picotti
PA Hoen
S Purcell
S Waszak
SS Rao
T Lappalainen
WE Kraus
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

Crossref

Serveur académique lausannois

University of Dundee Online Publications

Archive ouverte UNIGE

Iron Age and Anglo-Saxon genomes from East England reveal British migration history

Author: A Sajantila
AL Topf
AW Briggs
B Winney
C Capelli
CT O'Dushlaine
D Petts
G Jun
GB Busby
H Eckardt
H Härke
H Jònsson
H Li
H Li
HX Zheng
I Lazaridis
J Hines
J Montgomery
J Novembre
JK Pickrell
M Meyer
M Schubert
ME Weale
MG Thomas
N Patterson
N Rohland
P Balaresque
P Brotherton
P Budd
P Ralph
P Skoglund
PR Staab
S Besenbacher
S Leslie
S Schiffels
T Sundell
W Haak
Wellcome Trust
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

British population history has been shaped by a series of immigrations, including the early Anglo-Saxon migrations after 400 CE. It remains an open question how these events affected the genetic composition of the current British population. Here, we present whole-genome sequences from 10 individuals excavated close to Cambridge in the East of England, ranging from the late Iron Age to the middle Anglo-Saxon period. By analysing shared rare variants with hundreds of modern samples from Britain and Europe, we estimate that on average the contemporary East English population derives 38% of its ancestry from Anglo-Saxon migrations. We gain further insight with a new method, rarecoal, which infers population history and identifies fine-scale genetic ancestry from rare variants. Using rarecoal we find that the Anglo-Saxon samples are closely related to modern Dutch and Danish populations, while the Iron Age samples share ancestors with multiple Northern European populations including Britain.

CLoK

Crossref

Adelaide Research & Scholarship

PubMed Central

MPG.PuRe

Compression of Structured High-Throughput Sequencing Data

Author: ER Mardis
Fabien Campagne
Frederique Lisacek
H Li
H Li
James T. Robinson
Jill P. Mesirov
JK Pickrell
JR Shearstone
JT Robinson
Kevin C. Dorff
L Skrabanek
M Hsi-Yang Fritz
M Mangone
N Agrawal
N Popitsch
Nyasha Chambwe
SM Kielbasa
TD Wu
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 28/11/2012

Large biological datasets are being produced at a rapid pace and create substantial storage challenges, particularly in the domain of high-throughput sequencing (HTS). Most approaches currently used to store HTS data are either unable to quickly adapt to the requirements of new sequencing or analysis methods (because they do not support schema evolution), or fail to provide state-of-the-art compression of the datasets. We have devised new approaches to store HTS data that support seamless data schema evolution and compress datasets substantially better than existing approaches. Building on these new approaches, we discuss and demonstrate how a multi-tier data organization can dramatically reduce the storage, computational and network burden of collecting, analyzing, and archiving large sequencing datasets. For instance, we show that spliced RNA-Seq alignments can be stored in less than 4% of the size of a BAM file with perfect data fidelity. Compared to the previous compression state of the art, these methods reduce dataset size by more than 40% when storing exome, gene expression or DNA methylation datasets. The approaches have been integrated in a comprehensive suite of software tools (http://goby.campagnelab.org) that support common analyses for a range of high-throughput sequencing assays. Funding: National Center for Research Resources (U.S.) (Grant UL1 RR024996); Leukemia & Lymphoma Society of America (Translational Research Program Grant LLS 6304-11); National Institute of Mental Health (U.S.) (R01 MH086883

arXiv.org e-Print Archive

CiteSeerX

Public Library of Science (PLOS)

DSpace@MIT

Crossref

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

FigShare

Inference of population splits and mixtures from genome-wide allele frequency data

Author: A Keinan
A RoyChoudhury
AL Price
AR Boyko
BM Henn
BM vonHoldt
BS Weir
C Becquet
D Reich
D Reich
D Reich
DH Huson
DJ Lawson
EY Durand
G Bhatia
G Coop
G Hellenthal
G Liti
G McVean
G Nicholson
GM Lathrop
HG Parker
Hua Tang
I Gronau
J Felsenstein
J Felsenstein
J Felsenstein
J Hey
J Novembre
J Novembre
J Novembre
J Sirén
J Sukumaran
JK Pritchard
JK Pritchard
Jonathan K. Pritchard
Joseph K. Pickrell
JZ Li
K Lindblad-Toh
LL Cavalli-Sforza
LL Cavalli-Sforza
LL Cavalli-Sforza
LL Cavalli-Sforza
LS Kubatko
M Bonhomme
M DeGiorgio
M Jakobsson
M Nei
M Nei
M Rasmussen
MA Beaumont
N Patterson
N Patterson
N Saitou
NA Rosenberg
O François
P Beerli
P Menozzi
P Moorjani
R Nielsen
RE Green
RJ Dyer
RL Cann
RN Gutenkunst
RR Hudson
S Xu
SF Schaffner
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2012

Many aspects of the historical relationships between populations in a species are reflected in genetic data. Inferring these relationships from genetic data, however, remains a challenging task. In this paper, we present a statistical model for inferring the patterns of population splits and mixtures in multiple populations. In this model, the sampled populations in a species are related to their common ancestor through a graph of ancestral populations. Using genome-wide allele frequency data and a Gaussian approximation to genetic drift, we infer the structure of this graph. We applied this method to a set of 55 human populations and a set of 82 dog breeds and wild canids. In both species, we show that a simple bifurcating tree does not fully describe the data; in contrast, we infer many migration events. While some of the migration events that we find have been detected previously, many have not. For example, in the human data we infer that Cambodians trace approximately 16% of their ancestry to a population ancestral to other extant East Asian populations. In the dog data, we infer that both the boxer and basenji trace a considerable fraction of their ancestry (9% and 25%, respectively) to wolves subsequent to domestication, and that East Asian toy breeds (the Shih Tzu and the Pekingese) result from admixture between modern toy breeds and "ancient" Asian breeds. Software implementing the model described here, called TreeMix, is available at http://treemix.googlecode.com. Comment: 28 pages, 6 figures in main text. Attached supplement is 22 pages, 15 figures. This is an updated version of the preprint available at http://precedings.nature.com/documents/6956/version/

arXiv.org e-Print Archive

CiteSeerX

Crossref

Directory of Open Access Journals

FigShare

Localizing triplet periodicity in DNA and cDNA sequences

Author: AA Tsonis
AWC Liew
D Anastassiou
DL Black
G Gutierrez
I Daubechies
J Epps
J Sanchez
J Tuqan
JK Pickrell
JP Mena-Chalco
K Okamura
Lincoln D Stein
Liya Wang
M Stanke
M Yan
R Lewis
S Tiwari
TP George
WG Fairbrother
WJ Kent
YT Chan
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: The protein-coding regions (coding exons) of a DNA sequence exhibit a triplet periodicity (TP) due to the fact that coding exons contain a series of three-nucleotide codons that encode specific amino acid residues. Such periodicity is usually not observed in introns and intergenic regions. If a DNA sequence is divided into small segments and a Fourier Transform is applied to each segment, a strong peak at frequency 1/3 is typically observed in the Fourier spectrum of coding segments, but not in non-coding regions. This property has been used to identify the locations of protein-coding genes in unannotated sequence. The method is fast and requires no training. However, the need to compute the Fourier Transform across a segment (window) of arbitrary size affects the accuracy with which one can localize TP boundaries. Here, we report a technique that provides higher-resolution identification of these boundaries, and use the technique to explore the biological correlates of TP regions in the genome of the model organism C. elegans. Results: Using both simulated TP signals and the real C. elegans sequence F56F11 as an example, we demonstrate that (1) the Modified Wavelet Transform (MWT) can better define the boundary of a TP region than the conventional Short Time Fourier Transform (STFT); (2) the scale parameter (a) of the MWT determines the precision of TP boundary localization: bigger values of a give sharper TP boundaries but result in a lower signal-to-noise ratio; (3) RNA splicing sites have weaker TP signals than coding regions; (4) TP signals in coding regions can be destroyed or recovered by frame-shift mutations; (5) 6 bp periodicities in introns and intergenic regions can generate false positive signals, which can be removed with a 6 bp MWT. Conclusions: MWT can provide more precise TP boundaries than STFT, and the boundaries can be further refined by a bigger-scale MWT. Subtraction of 6 bp periodicity signals reduces the number of false positives. Experimentally introduced frame-shift mutations help recover TP signals that have been lost through possible ancient frame-shifts. More importantly, the TP signal has the potential to be used to detect the splice junctions in fully spliced mRNA sequence.
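The Fourier peak at frequency 1/3 that this abstract takes as its starting point can be illustrated with a short sketch. This is the plain whole-segment Fourier measure, not the Modified Wavelet Transform the paper proposes; the function name `tp_power` and the toy sequences are hypothetical and chosen only to make the contrast visible.

```python
import cmath
import random

def tp_power(seq: str) -> float:
    """Spectral power at frequency 1/3, summed over the four base-indicator
    sequences and normalised by the squared segment length."""
    n = len(seq)
    total = 0.0
    for base in "ACGT":
        # DFT coefficient of the 0/1 indicator sequence for this base at f = 1/3
        coeff = sum(cmath.exp(-2j * cmath.pi * pos / 3)
                    for pos, ch in enumerate(seq) if ch == base)
        total += abs(coeff) ** 2
    return total / (n * n)

random.seed(0)
# Toy "coding-like" segment: base usage depends on the position within each codon.
coding_like = "".join(
    random.choice("GGGA") + random.choice("ACGT") + random.choice("CCCT")
    for _ in range(100)
)
# Toy "non-coding-like" segment: uniformly random bases, no codon structure.
noncoding_like = "".join(random.choice("ACGT") for _ in range(300))

print(tp_power(coding_like), tp_power(noncoding_like))
```

In this toy setup the codon-biased segment gives a much larger value than the uniformly random one. Because the measure is computed over a whole window, it also shows why boundary localization depends on window size, the limitation the abstract sets out to address.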

Crossref

Cold Spring Harbor Laboratory Institutional Repository

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Cis and Trans Effects of Human Genomic Variants on Gene Expression

Author: A Boyd
AC Nica
AC Nica
AH Talukder
Alfonso Buil
AR Wood
AS Dimas
BE Stranger
BE Stranger
BU Schraml
C Wallace
Christopher David Brown
David M. Evans
DF Conrad
DJ Gaffney
DL Nicolae
DM Greenawalt
Donald F. Conrad
E Grundberg
E Klopocki
EE Schadt
Emmanouil T. Dermitzakis
George Davey Smith
GR Abecasis
H Huang
HJ Westra
J Ding
J Millstein
J Zhu
JD Storey
JE Powell
JK Pickrell
John P. Kemp
Julien Bryois
K Hildner
Karen M. Ho
LA Hindorff
M Gutierrez-Arcelus
M Scutari
Matthew Hurles
ND Miller
NL Barbosa-Morais
Panos Deloukas
RW Jones
SB Montgomery
SB Montgomery
SJ Loughran
Stephen B. Montgomery
Susan Ring
T Lappalainen
TR Insel
W Wang
Y Li
Y Li
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2014

This work was funded by the Louis-Jeantet Foundation (http://www.jeantet.ch/), the European Research Council (Grant ID: 260927 http://erc.europa.eu/), the Swiss National Foundation (Grant ID: 130342 http://www.snf.ch), NCCR Frontiers In Genetics (http://www.frontiers-in-genetics.org), the UK Medical Research Council (http://www.mrc.ac.uk) and the Wellcome Trust (Grant ID: 092731).

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Queen Mary Research Online

Explore Bristol Research

University of Queensland eSpace

Archive ouverte UNIGE

FigShare

Profiling allele-specific gene expression in brains from individuals with autism spectrum disorder reveals preferential minor allele usage.

Author: A Chess
A de la Chapelle
A Gimelbrant
A Hogart
A McKenna
A Mortazavi
AA Adegbola
AL Oberg
AM Veerappa
B DeVeale
B Howie
B Langmead
B Tycko
C Fuchsberger
C Gicquel
C Gregg
C Lee
CA de Leeuw
CCY Wong
Changhoon Lee
Cross-Disorder Group of the Psychiatric Genomics Consortium.
D Kim
Daniel H. Geschwind
DH Geschwind
E Ben-David
E Eden
E Eden
E Gardiner
EH Cook Jr.
Eleazar Eskin
EM Quinn
Eun Yong Kang
EY Kang
F Supek
GT Consortium
GV Kryukov
H Li
HA Scoles
I Iossifov
I Voineagu
J Cavaille
J Cavaille
J Grove
JC Darnell
JF Degner
JK Pickrell
KR Kukurba
M Elsabbagh
M Meguro-Horike
Michael J. Gandal
N Mukherjee
NN Parikshak
NN Parikshak
PH Sudmant
PS Bazeley
R Karlic
S Anders
S Kehr
S Kishore
S Nardone
S Purcell
Schizophrenia Working Group of the Psychiatric Genomics Consortium.
SE Castel
SJ Sanders
SM Weyn-Vanhentenryck
T Hulsen
V Savova
Y Zhang
YE Wu
Publication venue: eScholarship, University of California
Publication date: 01/09/2019

One fundamental but understudied mechanism of gene regulation in disease is allele-specific expression (ASE), the preferential expression of one allele. We leveraged RNA-sequencing data from human brain to assess ASE in autism spectrum disorder (ASD). When ASE is observed in ASD, the allele with lower population frequency (minor allele) is preferentially more highly expressed than the major allele, opposite to the canonical pattern. Importantly, genes showing ASE in ASD are enriched in those downregulated in ASD postmortem brains and in genes harboring de novo mutations in ASD. Two regions, 14q32 and 15q11, containing all known orphan C/D box small nucleolar RNAs (snoRNAs), are particularly enriched in shifts to higher minor allele expression. We demonstrate that this allele shifting enhances snoRNA-targeted splicing changes in ASD-related target genes in idiopathic ASD and 15q11-q13 duplication syndrome. Together, these results implicate allelic imbalance and dysregulation of orphan C/D box snoRNAs in ASD pathogenesis.

Crossref

eScholarship - University of California

Population structure and genetic history of Tibetan Terriers

Author: A Vaysse
AH Freedman
AJ Sams
AR Boyko
BH Choi
D Morris
DH Huson
DN Irion
E Axelsson
E Paradis
EA Ostrander
G Evanno
G Larson
G Leroy
GD Wang
H Mathiasen
HG Parker
HG Parker
Ino Curik
J Berglund
J Cunliffe
J Plassais
JF Pang
JK Pickrell
JK Pritchard
K Lindblad-Toh
LM Shannon
Mateja Janeš
Minja Zorc
MT Koskinen
NM Kopelman
Peter Dovc
R McQuillan
TD Whitaker
Vlatka Cubric-Curik
X Gou
ZL Ding
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/12/2019

Background: The Tibetan Terrier is a popular medium-sized companion dog breed. According to the history of the breed, the western population of Tibetan Terriers includes two lineages, Lamleh and Luneville. These two lineages derive from a small number of founder animals from the native Tibetan Terrier population, which were brought to Europe in the 1920s. For almost a century, the western population of Tibetan Terriers and the native population in Tibet were reproductively isolated. In this study, we analysed the structure of the western population of Tibetan Terriers, the original native population from Tibet and of different crosses between these two populations. We also examined the genetic relationships of Tibetan Terriers with other dog breeds, especially terriers and some Asian breeds, and the within-breed structure of both Tibetan Terrier populations. Results: Our analyses were based on high-density single nucleotide polymorphism (SNP) array (Illumina HD Canine 170 K) and microsatellite (18 loci) genotypes of 64 Tibetan Terriers belonging to different populations and lineages. For the comparative analysis, we used 348 publicly available SNP array genotypes of dogs from other breeds. We found that the western population of Tibetan Terriers and the native Tibetan Terriers clustered together with other Asian dog breeds, whereas all other terrier breeds were grouped into a separate group. We were also able to differentiate the western Tibetan Terrier lineages (Lamleh and Luneville) from the native Tibetan Terrier population. Conclusions: Our results reveal the relationships between the western and native populations of Tibetan Terriers and support the hypothesis that the Tibetan Terrier belongs to the group of ancient dog breeds of Asian origin, which are close to the ancestors of the modern dog that were involved in the early domestication process. Thus, we were able to reject the initial hypothesis that Tibetan Terriers belong to the group of terrier breeds. The existence of this native population of Tibetan Terriers at its original location represents an exceptional and valuable genetic resource.

Crossref

Edinburgh Research Explorer

Repository of the University of Ljubljana